Olympic Data

1 Introduction

Team 010100 consists of the following members: Obumneke Amadi, Izzy Illari, Lucia Illari, Omar Qusous, and Lydia Teinfalt. Our work is available on GitHub.

With the 2020 Olympics beginning this July in Tokyo, we felt it would be relevant to ask: What makes an Olympian? What can we say about Olympians? Have there been any general trends amongst Olympians? What does the Olympic population look like? These questions are all suited to exploratory data analysis (EDA), so we looked for readily available data on Olympians that we could analyze. Eventually our question became the following: are there any specific characteristics (i.e. age, weight, height, BMI, country of origin) that could be used to describe an Olympian in general?

We found a dataset called "120 years of Olympic history: athletes and results" on Kaggle: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results. This historical dataset covers all Olympic Games from Athens 1896 to Rio 2016 and was scraped from https://www.sports-reference.com/. The data was compiled by a group of Olympic historians and statisticians, all of them members of the International Society of Olympic Historians (ISOH), who have been working on this project since the late 1990s.

The report is organized as follows:

  1. Summary of Dataset
  2. Description of Data/Descriptive Stats
  3. BMI of Olympic Athletes
  4. Geographical Data
  5. Name Data
  6. Changes in Weight/Height over the Decades
  7. Age Data

2 Summary of Dataset

The data looks like the following:

'data.frame':   271116 obs. of  15 variables:
 $ ID    : int  1 2 3 4 5 5 5 5 5 5 ...
 $ Name  : Factor w/ 134732 levels "  Gabrielle Marie \"Gabby\" Adcock (White-)",..: 8 9 44318 29412 21469 21469 21469 21469 21469 21469 ...
 $ Sex   : Factor w/ 2 levels "F","M": 2 2 2 2 1 1 1 1 1 1 ...
 $ Age   : int  24 23 24 34 21 21 25 25 27 27 ...
 $ Height: int  180 170 NA NA 185 185 185 185 185 185 ...
 $ Weight: num  80 60 NA NA 82 82 82 82 82 82 ...
 $ Team  : Factor w/ 1184 levels "30. Februar",..: 199 199 273 278 705 705 705 705 705 705 ...
 $ NOC   : Factor w/ 230 levels "AFG","AHO","ALB",..: 42 42 56 56 146 146 146 146 146 146 ...
 $ Games : Factor w/ 51 levels "1896 Summer",..: 38 49 7 2 37 37 39 39 40 40 ...
 $ Year  : int  1992 2012 1920 1900 1988 1988 1992 1992 1994 1994 ...
 $ Season: Factor w/ 2 levels "Summer","Winter": 1 1 1 1 2 2 2 2 2 2 ...
 $ City  : Factor w/ 42 levels "Albertville",..: 6 18 3 27 9 9 1 1 17 17 ...
 $ Sport : Factor w/ 66 levels "Aeronautics",..: 9 33 25 62 54 54 54 54 54 54 ...
 $ Event : Factor w/ 765 levels "Aeronautics Mixed Aeronautics",..: 160 398 349 710 623 619 623 619 623 619 ...
 $ Medal : Factor w/ 3 levels "Bronze","Gold",..: NA NA NA 2 NA NA NA NA NA NA ...

The athlete events data has 15 columns and 271116 rows, for a total of 4066740 individual data points. In athlete_events, each row corresponds to an individual athlete competing in an individual Olympic event. The variables are the following:

  1. ID: Unique number for each athlete
  2. Name: Athlete’s name
  3. Sex: M or F
  4. Age: Integer
  5. Height: centimeters
  6. Weight: kilograms
  7. Team: Team name
  8. NOC: National Olympic Committee 3-letter code
  9. Games: Year and season
  10. Year: Integer
  11. Season: Summer or Winter
  12. City: Host city
  13. Sport
  14. Event
  15. Medal: Gold, Silver, Bronze, or NA
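
As a minimal sketch of how a structure listing like the one above could be obtained, assume the Kaggle file has been saved locally as athlete_events.csv. The path and the helper names load_athletes and count_data_points are illustrative assumptions, not part of the dataset or of any package.

```r
# Hypothetical helper: the default file name is an assumption about where
# the Kaggle CSV was saved.
load_athletes <- function(path = "athlete_events.csv") {
  # stringsAsFactors = TRUE gives Factor columns like those in the str()
  # listing (R >= 4.0 no longer does this by default).
  # na.strings = "NA" reads the literal text NA as a missing value.
  read.csv(path, stringsAsFactors = TRUE, na.strings = "NA")
}

# Number of individual data points = rows * columns.
count_data_points <- function(df) {
  nrow(df) * ncol(df)
}

# Usage, assuming the CSV sits in the working directory:
# athlete_events <- load_athletes()
# str(athlete_events)                # structure listing
# count_data_points(athlete_events)  # rows * columns
```

For the dimensions reported in this section, the product is 271116 * 15 = 4066740.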

To prepare the data for EDA we dropped the Olympic event Art Sculpting. We also removed NAs.
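
The cleaning step could look roughly like the sketch below. Several details are assumptions: the pattern used to match the sculpting event names, and the choice to remove NAs only in Age, Height, and Weight. Medal is deliberately left out, since NA there means that no medal was won. The function name clean_athletes is hypothetical.

```r
# Hypothetical cleaning helper. Both defaults are assumptions:
#   drop_pattern - text matched against Event to find the sculpting events
#   required     - columns in which an NA causes the row to be dropped
clean_athletes <- function(df,
                           drop_pattern = "Sculpt",
                           required = c("Age", "Height", "Weight")) {
  # TRUE for rows whose Event name does not match the pattern.
  keep_event <- !grepl(drop_pattern, as.character(df$Event), ignore.case = TRUE)
  # TRUE for rows with no missing value in any required column.
  keep_complete <- complete.cases(df[, required, drop = FALSE])
  # Subset, then drop factor levels that no longer occur.
  droplevels(df[keep_event & keep_complete, , drop = FALSE])
}

# Usage:
# athletes_clean <- clean_athletes(athlete_events)
```

Passing required = names(df) would instead drop every row with any NA, which would also discard all athletes who did not win a medal.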

3 Description of Data/Descriptive Stats

We can look at the top 10 sports by number of athletes participating in them. We can show this in a table or in a bar chart.

sport.names  sport.counts
Swimming             2486
Rowing               2104
Ice Hockey           1301
Hockey               1168
Gymnastics           1161
Fencing              1109
Football             1084
Canoeing             1041
Basketball           1000
Wrestling             967

In list form:

  1. Swimming
  2. Rowing
  3. Ice Hockey
  4. Hockey
  5. Gymnastics
  6. Fencing
  7. Football
  8. Canoeing
  9. Basketball
  10. Wrestling
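
A ranking of this kind can be computed as in the following sketch. It assumes that "number of athletes" means distinct athlete IDs per sport, so that an athlete entered in several events of one sport is counted once. Whether the counts above were produced exactly this way is not stated here. The function name top_sports is hypothetical.

```r
# Hypothetical helper: rank sports by the number of distinct athletes.
top_sports <- function(df, n = 10) {
  # One row per (athlete, sport) pair, so repeat entries count once.
  pairs <- unique(df[, c("ID", "Sport")])
  # Tabulate athletes per sport and sort from largest to smallest.
  counts <- sort(table(as.character(pairs$Sport)), decreasing = TRUE)
  top <- head(counts, n)
  data.frame(sport.names = names(top),
             sport.counts = as.integer(top),
             stringsAsFactors = FALSE)
}

# Usage: table first, then the same data as a bar chart.
# top10 <- top_sports(athlete_events)
# print(top10)
# barplot(top10$sport.counts, names.arg = top10$sport.names, las = 2)
```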

4 BMI of Olympic Athletes

5 Graphical Representation of Data

6 Geographical Data

7 Olympic Name Data

9 Age Data

Medal   Mean age
Gold    25.9
Silver  26.0
Bronze  25.9

The mean age at which an Olympic medalist wins a medal appears to be around 26, whatever the medal. We can also look at the ages of medal-winning athletes separately for the Summer and Winter Games.
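
A per-medal mean of this kind can be computed with base R's aggregate, as in the sketch below. The function name mean_age_by_medal is hypothetical, and rounding to one decimal place is an assumption made to match the table.

```r
# Hypothetical helper: mean age of medal winners, one row per medal type.
mean_age_by_medal <- function(df) {
  # Keep only rows with a medal and a recorded age.
  medalists <- df[!is.na(df$Medal) & !is.na(df$Age), ]
  # Average Age within each Medal group.
  out <- aggregate(Age ~ Medal, data = medalists, FUN = mean)
  names(out) <- c("Medal", "mean")
  out$mean <- round(out$mean, 1)
  out
}

# Usage:
# mean_age_by_medal(athlete_events)
```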

From the plots we can see that the average age of medalists decreased between WW1 and WW2 and rose temporarily after WW2. It then decreased until 1980 and rose again afterwards. The age seems to plateau in the 2010s.

When we look at the medal-winning athletes by gender we see that in general men get medals at older ages than women do.

Now we can look at the trends amongst the athletes in the Winter Games.

There seem to be fewer peaks and dips than in the Summer Games data: the Winter athletes appear to have a smaller variance in age. We can also compare the Summer and Winter Games directly.

It appears that after the 1950s the athletes at the Winter Games are, on average, older than those at the Summer Games. Both Summer and Winter Games experience an upward trend in ages after the 1980s.

We can also separate the Winter Games data by gender, as we did for the Summer Games.

Just as with the Summer Games we see that, on average, male athletes tend to be older than female athletes.
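
The yearly averages behind trend plots like these can be tabulated as in the sketch below, which groups medalists by Year and Sex within one season. The function name mean_age_trend is hypothetical, and the plotting calls in the usage comments are only one possible way to draw the curves.

```r
# Hypothetical helper: mean medalist age per Games year and sex for one season.
mean_age_trend <- function(df, season = "Summer") {
  # Medal winners with a recorded age in the chosen season.
  sel <- !is.na(df$Medal) & !is.na(df$Age) & df$Season == season
  # Average Age within each (Year, Sex) group.
  out <- aggregate(Age ~ Year + Sex, data = df[sel, ], FUN = mean)
  # Sort so that each sex forms one chronological series.
  out[order(out$Sex, out$Year), ]
}

# Usage: one line per sex for the Winter Games.
# winter <- mean_age_trend(athlete_events, season = "Winter")
# men <- winter[winter$Sex == "M", ]
# women <- winter[winter$Sex == "F", ]
# plot(men$Year, men$Age, type = "l", xlab = "Year", ylab = "Mean age",
#      ylim = range(winter$Age))
# lines(women$Year, women$Age, lty = 2)
```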

10 Discussion & Conclusion

team 010100

06 March, 2020